Overview

  • Who are we?
  • Who are you?
  • What is expected?
  • Why does this class exist?
  • Collection
  • Changing computing (Parallel / Cloud)
  • Course outline

Overview

  • What is an image?
  • Where do images come from?
  • Science and Reproducibility
  • Workflows

Who are we?

Kevin Mader (mader@biomed.ee.ethz.ch)

  • CTO at 4Quant for Big Image Analytics (ETH Spin-off)
  • Lecturer at ETH Zurich
  • Postdoc in the X-Ray Microscopy Group at ETH Zurich and Swiss Light Source at Paul Scherrer Institute

Ioannis Vogiatzis (ioannis.vogiatzis@psi.ch)

  • Exercise assistance
  • PhD Student in the X-Ray Microscopy Group at ETH Zurich and Swiss Light Source at Paul Scherrer Institute

Guest Lecturers

Anders Kaestner, PhD (anders.kaestner@psi.ch)

  • Group Leader at the ICON Beamline at the SINQ (Neutron Source) at Paul Scherrer Institute

Michael Prummer, PhD (prummer@nexus.ethz.ch)

  • Biostatistician at NEXUS Personalized Health Technol.
  • Previously Senior Scientist at F. Hoffmann-La Roche Ltd., Basel, Switzerland.
  • Pharma Research & Early Development (pRED), Discovery Technologies
  • Phenotypic Drug Discovery & Target Identification.
  • Topic: High Content Screening (HCS), Image analysis, Biostatistics, Image Management System.

Javier Montoya, PhD

Research Scientist at ScopeM

Previously Research Assistant in Photogrammetric and Geodesics

  • Computer Vision
  • Machine Learning
  • Data analysis & modelling
  • Remote Sensing

Who are you?

A wide spectrum of backgrounds

  • Biomedical Engineers, Physicists, Chemists, Art History Researchers, Mechanical Engineers, and Computer Scientists

A wide range of skills

  • I think I've heard of Matlab before \(\rightarrow\) I write template C++ code and hand optimize it afterwards

So how will this ever work?

Adaptive assignments

Conceptual, graphical assignments with practical examples

  • Emphasis on choosing correct steps and understanding workflow

Opportunities to create custom implementations, plugins, and perform more complicated analysis on larger datasets if interested

  • Emphasis on performance, customizing analysis, and scalability

Course Expectations

Exercises

  • Usually 1 set per lecture
  • Optional (but recommended!)
  • Easy - using GUIs (KNIME and ImageJ) and completing Matlab Scripts (just lecture 2)
  • Advanced - Writing Python, Java, Scala, …

Science Project

  • Optional (but strongly recommended)
  • Applying techniques to answer a scientific question!
  • Ideally on a topic relevant to your current project, thesis, or personal activities
  • or choose from one of ours (will be online, soon)
  • Present approach, analysis, and results

Literature / Useful References

General Material

  • Jean Claude, Morphometry with R
    • Online through ETHZ
    • Buy it
  • John C. Russ, “The Image Processing Handbook” (Boca Raton, CRC Press)
    • Available online within domain ethz.ch (or proxy.ethz.ch / public VPN)
  • J. Weickert, Visualization and Processing of Tensor Fields
    • Online

Today's Material

Motivation

  • To understand what, why and how from the moment an image is produced until it is finished (published, used in a report, …)
  • To learn how to go from one analysis on one image to 10, 100, or 1000 images (without working 10, 100, or 1000X harder)

Motivation

(Why does this class exist?)

X-Ray

  • SRXTM images at (>1000fps) → 8GB/s
  • cSAXS diffraction patterns at 30GB/s
  • Nanoscopium Beamline, 10TB/day, 10-500GB file sizes

Optical

  • Light-sheet microscopy (see talk of Jeremy Freeman) produces images → 500MB/s
  • High-speed confocal images at (>200fps) → 78Mb/s

Personal

  • GoPro 4 Black - 60MB/s (3840 x 2160 x 30fps) for $600
  • fps1000 - 400MB/s (640 x 480 x 840 fps) for $400

Motivation

  1. Experimental Design: finding the right technique and picking the right dyes and samples has stayed relatively consistent; better techniques lead to more demanding scientists.

  2. Management: storing, backing up, and setting up databases; these processes have become easier and more automated as data magnitudes have increased

  3. Measurements: the actual acquisition speed of the data has increased wildly due to better detectors, parallel measurement, and new higher intensity sources

  4. Post Processing: this portion is the most time-consuming and difficult, and has seen minimal improvements over the last years

Saturating Output

Year Measurements Publications
2000 146 67
2008 584 110
2014 1031 128
2020 1081 133

To put more real numbers on these scales rather than 'pseudo-publications', the time to measure a terabyte of data is shown in minutes.

Year Time to 1 TB in Minutes
2000 4096
2008 1092
2014 32
2016 2

How much is a TB, really?

If you looked at one 1000 x 1000 sized image every second, it would take you
139 hours to browse through a terabyte of data.
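
The 139-hour figure can be checked with a short calculation. This is only a sketch: it assumes 16-bit (2-byte) pixels and a decimal terabyte (10^12 bytes). Neither is stated above, but together they reproduce the number.

```python
# Assumptions (not stated in the slide): 2 bytes per pixel, 1 TB = 10**12 bytes.
BYTES_PER_TB = 10**12
BYTES_PER_PIXEL = 2


def hours_to_browse(width=1000, height=1000, seconds_per_image=1.0):
    """Hours needed to look once at every image in one terabyte."""
    bytes_per_image = width * height * BYTES_PER_PIXEL
    n_images = BYTES_PER_TB / bytes_per_image
    return n_images * seconds_per_image / 3600


print(round(hours_to_browse()))  # 139
```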

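| Year | Time to 1 TB | Man power to keep up | Salary Costs / Month |
|------|--------------|----------------------|----------------------|
| 2000 | 4096 min | 2 people | 25 kCHF |
| 2008 | 1092 min | 8 people | 95 kCHF |
| 2014 | 32 min | 260 people | 3255 kCHF |
| 2016 | 2 min | 3906 people | 48828 kCHF |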
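
The table can be approximately reproduced from the browsing time: the number of people is the time one person needs to look through a terabyte divided by the time it takes to measure one. The sketch below assumes 2-byte pixels (500,000 images per terabyte, viewed at one per second) and a cost of 12.5 kCHF per person per month. That cost is inferred from the table rather than stated in it.

```python
# Assumed, not stated in the slide: one person needs 500,000 seconds to view
# 1 TB (one image per second), and costs 12.5 kCHF per month.
VIEW_MINUTES_PER_TB = 500_000 / 60
KCHF_PER_PERSON_MONTH = 12.5


def people_to_keep_up(minutes_to_measure_tb):
    """People needed to view data as fast as it is measured."""
    return VIEW_MINUTES_PER_TB / minutes_to_measure_tb


for year, minutes in [(2000, 4096), (2008, 1092), (2014, 32)]:
    people = people_to_keep_up(minutes)
    cost = people * KCHF_PER_PERSON_MONTH
    print(year, round(people), "people,", round(cost), "kCHF / month")
```

Under these assumptions the 2016 row is matched only if the measurement time is about 2.13 minutes rather than the rounded 2 minutes shown.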

Overwhelmed

  • Count how many cells are in the bone slice
  • Ignore the ones that are ‘too big’ or shaped ‘strangely’
  • Are there more on the right side or left side?
  • Are the ones on the right or left bigger, top or bottom?

More overwhelmed

  • Do it all over again for 96 more samples, this time with 2000 slices instead of just one!

Bring on the pain

  • Now again with 1090 samples!

It gets better

  • Those metrics were quantitative and could be easily visually extracted from the images
  • What happens if you have softer metrics?
  • How aligned are these cells?
  • Is the group on the left more or less aligned than the right?
  • errr?

Dynamic Information

  • How many bubbles are here?
  • How fast are they moving?
  • Do they all move the same speed?
  • Do bigger bubbles move faster?
  • Do bubbles near the edge move slower?
  • Are they rearranging?

Computing has changed: Parallel

Moore's Law

There are now many more transistors inside a single computer but the processing speed hasn't increased. How can this be?

  • Multiple Cores
    • Many machines have multiple cores for each processor which can perform tasks independently
  • Multiple CPUs
    • More than one chip is commonly present
  • New modalities
    • GPUs provide many cores which operate at slow speed

Parallel Code is important
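
As a minimal sketch of what parallel code can look like, the example below computes the mean intensity of several small images, first one after another and then with a pool of workers from Python's standard `concurrent.futures` module. The toy images and the `mean_intensity` helper are made up for illustration. A thread pool is used only to keep the example self-contained; for CPU-heavy pure-Python work a process pool would usually be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def mean_intensity(image):
    """Mean of all pixel values in a 2D image given as a list of rows."""
    pixels = [value for row in image for value in row]
    return sum(pixels) / len(pixels)


# Hypothetical stand-ins for images loaded from disk.
images = [
    [[28, 13], [40, 49]],
    [[18, 47], [67, 100]],
    [[69, 72], [63, 34]],
]

# Serial version: one image after another.
serial = [mean_intensity(image) for image in images]

# Parallel version: each image is an independent task, so the work can be
# spread over several workers; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(mean_intensity, images))

print(parallel)
```

The important property is that no image depends on the result of another, so the tasks can run in any order or at the same time.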

Computing has changed: Cloud

  • Computers, servers, and workstations are wildly underused (majority are <50%)
  • Buying a big computer that sits idle most of the time is a waste of money

http://www-inst.eecs.berkeley.edu/~cs61c/sp14/ “The Case for Energy-Proportional Computing,” Luiz André Barroso, Urs Hölzle, IEEE Computer, December 2007

  • Traditionally the most important performance criterion was time: how fast can it be done?
  • With Platform as a Service, servers can be rented instead of bought
  • Speed is still important, but with cloud computing $ / Sample is the real metric
  • In Switzerland a PhD student is 400x as expensive per hour as an Amazon EC2 Machine
  • Many competitors keep prices low and offer flexibility

Cloud Computing Costs

The figure shows the range of cloud costs (determined by peak usage) compared to a local workstation with utilization shown as the average number of hours the computer is used each week.

The figure shows the cost of a cloud based solution as a percentage of the cost of buying a single machine. The values below 1 show the percentage as a number. The panels distinguish the average time to replacement for the machines in months

Cloud: Equal Cost Point

Here the equal cost point is shown, where the cloud and local workstations have the same cost. The x-axis is the percentage of resources used at peak time and the y-axis shows the expected usable lifetime of the computer. The color indicates the utilization percentage and the text on the squares shows this as the number of hours used in a week.
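
The idea of an equal cost point can be sketched with a very simplified model: a workstation has a fixed purchase price regardless of use, while a cloud machine is paid for by the hour. All prices below are hypothetical placeholders. The model ignores sizing for peak usage, which the figures described above include, as well as running costs such as electricity and administration.

```python
WEEKS_PER_MONTH = 52 / 12


def break_even_hours_per_week(workstation_price, cloud_price_per_hour, lifetime_months):
    """Weekly usage at which renting costs the same as buying.

    Below this many hours per week the cloud is cheaper in this simple model;
    above it, the workstation is cheaper.
    """
    lifetime_weeks = lifetime_months * WEEKS_PER_MONTH
    return workstation_price / (cloud_price_per_hour * lifetime_weeks)


# Hypothetical numbers: a 5200 CHF workstation replaced after 24 months,
# compared with a cloud machine rented at 1 CHF per hour.
hours = break_even_hours_per_week(5200, 1.0, 24)
print(f"{hours:.1f} hours per week")
```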

Course Overview

| Lecture | Description | Applications |
|---------|-------------|--------------|
| 23rd February - Introduction and Workflows | Basic overview of the course, introduction to the basics of images and their acquisition, the importance of reproducibility and why workflows make sense for image processing | Calculating the intensity for a folder full of images |
| 2nd March - Image Enhancement (A. Kaestner) | Overview of what techniques are available for assessing and improving the quality of images, specifically various filters, when to apply them, their side-effects, and how to apply them correctly | Removing detector noise from neutron images to distinguish different materials |
| 9th March - Tutorial on Python and Jupyter (TBA) | An introduction to the Python world of image analysis and the scikit projects | Getting familiar with Python and learning how the basic scikit tools work |

Overview: Segmentation

| Lecture | Description | Applications |
|---------|-------------|--------------|
| 16th March - Basic Segmentation, Discrete Binary Structures | How to convert images into structures, starting with very basic techniques like thresholding and exploring several automated techniques | Identify cells from noise, background, and dust |
| 23rd March - Advanced Segmentation | More advanced techniques for extracting structures including basic clustering and classification approaches, and component labeling | Identifying fat and ice crystals in ice cream images |

Overview: Analysis

| Lecture | Description | Applications |
|---------|-------------|--------------|
| 30th March - Analyzing Single Objects | The analysis and characterization of single structures/objects after they have been segmented, including shape and orientation | Count cells and determine their average shape and volume |
| 6th April - Analyzing Complex Objects | What techniques are available to analyze more complicated objects with poorly defined 'shape' using Distance maps, Thickness maps, and Voronoi tessellation | Separate clumps of cells, analyze vessel networks, trabecular bone, and other similar structures |
| 13th April - Many Objects and Distributions | Extracting meaningful information for a collection of objects like their spatial distribution, alignment, connectivity, and relative positioning | Quantify cells as being evenly spaced or tightly clustered or organized in sheets |

Overview: Big Imaging

| Lecture | Description | Applications |
|---------|-------------|--------------|
| 27th April - Statistics and Reproducibility | Making a statistical analysis from quantified image data, and establishing the precision of the metrics calculated, also more coverage of the steps to making an analysis reproducible | Determine properly if/how different a cancerous cell is from a healthy cell |
| 4th May - Dynamic Experiments | Performing tracking and registration in dynamic, changing systems covering object and image based methods | Turning a video of foam flow into metrics like speed, average deformation, and reorganization |
| 11th May - Scaling Up / Big Data | Performing large scale analyses on clusters and cloud-based machines and an introduction of how to work with 'big data' frameworks | Performing large scale analyses using ETH's clusters and Amazon's Cloud Resources, how to do anything with a terabyte of data |

Overview: Wrapping Up

| Lecture | Description | Applications |
|---------|-------------|--------------|
| 18th May - Guest Lecture - High Content Screening (M. Prummer) / Project Presentations | How Roche does Microscopy at Scale with High Content Screening and what the important image analysis aspects are | Robust analysis of millions of images for making decisions about pharmaceuticals to pursue |
| 1st June - Guest Lecture - Big Aerial Images with Deep Learning and More Advanced Approaches (J. Montoya) | Applying more advanced techniques from the field of Machine Learning to image processing segmentation and analysis of aerial images, specifically Support vector machines (SVM) and Markov Random Fields (MRF) | Identifying houses, streets, and cars in satellite images |

What is an image?

A very abstract definition: A pairing between spatial information (position) and some other kind of information (value).

In most cases this is a 2 dimensional position (x,y coordinates) and a numeric value (intensity)

x y Intensity
1 1 28
2 1 13
3 1 40
4 1 49
5 1 18
1 2 47

This can then be rearranged from a table form into an array form and displayed as we are used to seeing images
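
The rearrangement from table form to array form can be sketched in a few lines. The rows are the ones shown in the table above; the `table_to_array` helper is a hypothetical name, and positions that are not listed in the (truncated) table are simply left empty.

```python
# Rows of (x, y, intensity), using the 1-based positions from the table above.
table = [
    (1, 1, 28), (2, 1, 13), (3, 1, 40), (4, 1, 49), (5, 1, 18),
    (1, 2, 47),
]


def table_to_array(rows, fill=None):
    """Arrange (x, y, value) rows into a list of rows indexed as array[y][x]."""
    width = max(x for x, _, _ in rows)
    height = max(y for _, y, _ in rows)
    array = [[fill] * width for _ in range(height)]
    for x, y, value in rows:
        array[y - 1][x - 1] = value  # shift from 1-based to 0-based indices
    return array


image = table_to_array(table)
for row in image:
    print(row)
```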

2D Intensity Images

The next step is to apply a color map (also called lookup table, LUT) to the image so it is a bit more exciting

Which can be arbitrarily defined based on how we would like to visualize the information in the image

Lookup Tables

Formally a lookup table is a function which maps an intensity to a color: \[ f(\textrm{Intensity}) \rightarrow \textrm{Color} \]

These transformations can also be non-linear, as in the graph below, where the mapping between the intensity and the color is a \(\log\) relationship, meaning that the difference between the lower values is much clearer than between the higher ones
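
A lookup table can be sketched as an ordinary function. In the example below a gray level between 0 and 255 stands in for the color, and the intensity range of 0 to 100 is an assumption made for illustration; the function names are hypothetical.

```python
import math


def linear_lut(intensity, max_intensity=100):
    """Map an intensity in [0, max_intensity] linearly to a gray level 0-255."""
    return round(255 * intensity / max_intensity)


def log_lut(intensity, max_intensity=100):
    """Map intensity to a gray level 0-255 on a logarithmic scale.

    log1p(x) = log(1 + x) is used so that an intensity of 0 maps to 0.
    """
    return round(255 * math.log1p(intensity) / math.log1p(max_intensity))


# Print intensity, linear gray level, and logarithmic gray level side by side.
for value in (1, 10, 50, 100):
    print(value, linear_lut(value), log_lut(value))
```

With the logarithmic mapping, the step from 1 to 10 spans far more gray levels than the step from 90 to 100, which is why low values become easier to tell apart.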

Applied LUTs

On a real image the difference is even clearer

3D Images

For a 3D image, the position or spatial component has a 3rd dimension (z if it is spatial, or t if it is a movie)

x y z Intensity
1 1 1 67
2 1 1 100
3 1 1 69
1 2 1 72
2 2 1 63
3 2 1 34

This can then be rearranged from a table form into an array form and displayed as a series of slices

Multiple Values

In the images thus far, we have had one value per position, but there is no reason there cannot be multiple values. In fact this is what color images are: three values (red, green, and blue), or even four channels with transparency (alpha) as an additional value. For clarity we call the number of dimensions in the spatial position the dimensionality of the image, and the number of values at each position the depth.

x y Intensity Transparency
1 1 51 49
2 1 52 40
3 1 44 7
4 1 40 35
5 1 25 43
1 2 19 57

This can then be rearranged from a table form into an array form and displayed as a series of slices
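
The distinction between dimensionality and depth can be made concrete by storing the image literally as a pairing between position and value. This is only a sketch using the rows from the table above; a dictionary is chosen for clarity, not efficiency.

```python
# Rows of (x, y, intensity, transparency) from the table above.
rows = [
    (1, 1, 51, 49), (2, 1, 52, 40), (3, 1, 44, 7),
    (4, 1, 40, 35), (5, 1, 25, 43), (1, 2, 19, 57),
]

# The image as a pairing: position (x, y) -> values (intensity, transparency).
image = {(x, y): (intensity, transparency) for x, y, intensity, transparency in rows}

position, value = next(iter(image.items()))
dimensionality = len(position)  # number of spatial coordinates
depth = len(value)              # number of values stored at each position

print("dimensionality:", dimensionality, "depth:", depth)
```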

Hyperspectral Imaging

At each point in the image (black dot), instead of having just a single value, there is an entire spectrum. A selected group of these (red dots) are shown to illustrate the variations inside the sample. While certainly much more complicated, this still constitutes an image and requires the same sort of techniques to process correctly.

Image Formation

  • Impulses: Light, X-Rays, Electrons, A sharp point, Magnetic field, Sound wave
  • Characteristics: Electron Shell Levels, Electron Density, Phonon energy levels, Electronic, Spins, Molecular mobility
  • Response: Absorption, Reflection, Phase Shift, Scattering, Emission
  • Detection: Your eye, Light sensitive film, CCD / CMOS, Scintillator, Transducer

Where do images come from?

Various modalities and their ways of being recorded

| Modality | Impulse | Characteristic | Response | Detection |
|----------|---------|----------------|----------|-----------|
| Light Microscopy | White Light | Electronic interactions | Absorption | Film, Camera |
| Phase Contrast | Coherent light | Electron Density (Index of Refraction) | Phase Shift | Phase stepping, holography, Zernike |
| Confocal Microscopy | Laser Light | Electronic Transition in Fluorescence Molecule | Absorption and reemission | Pinhole in focal plane, scanning detection |
| X-Ray Radiography | X-Ray light | Photo effect and Compton scattering | Absorption and scattering | Scintillator, microscope, camera |
| Ultrasound | High frequency sound waves | Molecular mobility | Reflection and Scattering | Transducer |
| MRI | Radio-frequency EM | Unmatched Hydrogen spins | Absorption and reemission | RF coils to detect |
| Atomic Force Microscopy | Sharp Point | Surface Contact | Contact, Repulsion | Deflection of a tiny mirror |

Acquiring Images

Traditional / Direct imaging

  • Visible images produced or can be easily made visible
  • Optical imaging, microscopy

Here the measurement is supposed to be from a typical microscope, which blurs, flips and otherwise distorts the image, but the original representation is still visible.

Indirect / Computational imaging

  • Recorded information does not resemble object
  • Response must be transformed (usually computationally) to produce an image

Here the measurement is supposed to be from a diffraction-style experiment, where the data is measured in reciprocal space (Fourier) and can be reconstructed to the original shape.
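
As a toy illustration of indirect imaging, the sketch below "measures" a 1D object in reciprocal space with a discrete Fourier transform and then reconstructs it with the inverse transform. It assumes that the full complex signal (amplitude and phase) is recorded, which is a simplification; the function names and the test object are made up for this example.

```python
import cmath


def dft(signal):
    """Discrete Fourier transform: real space -> reciprocal space."""
    n = len(signal)
    return [
        sum(signal[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse transform: reciprocal space -> real space."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in range(n)) / n
        for x in range(n)
    ]


# A simple 1D "object": a bright block on a dark background.
obj = [0, 0, 1, 1, 1, 0, 0, 0]

measurement = dft(obj)                     # what the detector records
reconstruction = inverse_dft(measurement)  # computational step back to an image

# The recorded values do not resemble the object, but the reconstruction does.
print([round(abs(m), 2) for m in measurement])
print([round(r.real, 2) for r in reconstruction])
```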

Traditional Imaging

Copyright 2003-2013 J. Konrad in EC520 lecture, reused with permission

Traditional Imaging: Model

\[ \left[\left([b(x,y)*s_{ab}(x,y)]\otimes h_{fs}(x,y)\right)*h_{op}(x,y)\right]*h_{det}(x,y)+d_{dark}(x,y) \]

\(s_{ab}\) is the only information you are really interested in, so it is important to remove or correct for the other components

For color (non-monochromatic) images the problem becomes even more complicated \[ \int_{0}^{\infty} {\left[\left([b(x,y,\lambda)*s_{ab}(x,y,\lambda)]\otimes h_{fs}(x,y,\lambda)\right)*h_{op}(x,y,\lambda)\right]*h_{det}(x,y,\lambda)}\mathrm{d}\lambda+d_{dark}(x,y) \]

Indirect Imaging (Computational Imaging)

  • Tomography through projections
  • Microlenses (Light-field photography)
  • Diffraction patterns
  • Hyperspectral imaging with Raman, IR, CARS
  • Surface Topography with cantilevers (AFM)

Image Analysis

  • An image is a bucket of pixels.
  • How you choose to turn it into useful information is strongly dependent on your background

Image Analysis: Experimentalist

Problem-driven

Top-down

Reality Model-based

Examples

  • cell counting
  • porosity

Image Analysis: Computer Vision Approaches

  • Method-driven
  • Feature-based
  • Image Model-based
  • Engineer features for solving problems

Examples

  • edge detection
  • face detection

Image Analysis: Deep Learning Approach

  • Results-driven
  • Biology ‘inspired’
  • Build both image processing and analysis from scratch

Examples

  • Captioning images
  • Identifying unusual events

On Science

What is the purpose?

  • Discover and validate new knowledge

How?

  • Use the scientific method as an approach to convince other people
  • Build on the results of others so we don't start from the beginning

Important Points

  • While qualitative assessment is important, it is difficult to reliably produce and scale
  • Quantitative analysis is far from perfect, but provides metrics which can be compared and regenerated by anyone

Inspired by: imagej-pres

Science and Imaging

Images are great for qualitative analyses since our brains can quickly interpret them without large programming investments.

Proper processing and quantitative analysis is however much more difficult with images.

  • If you measure a temperature, quantitative analysis is easy, \(50K\).
  • If you measure an image it is much more difficult and much more prone to mistakes, subtle setup variations, and confusing analyses

Furthermore in image processing there is a plethora of tools available

  • Thousands of algorithms available
  • Thousands of tools
  • Many images require multi-step processing
  • Experimenting is time-consuming

Why quantitative?

Human eyes have issues

Which center square seems brighter?

Are the intensities constant in the image?

Reproducibility

Science demands repeatability, and really wants reproducibility

  • Experimental conditions can change rapidly and are difficult to make consistent
  • Animal and human studies are prohibitively time consuming and expensive to reproduce
  • Terabyte datasets cannot be easily passed around many different groups
  • Privacy concerns can also limit sharing and access to data

  • Science is already difficult enough
  • Image processing makes it even more complicated
  • Many image processing tasks are multistep, have many parameters, use a variety of tools, and consume a very long time

How can we keep track of everything for ourselves and others?

  • We can make the data analysis easy to repeat by an independent 3rd party

Soup/Recipe Example

Simple Soup

Easy to follow the list, anyone with the right steps can execute and repeat (if not reproduce) the soup

  1. Buy {carrots, peas, tomatoes} at market
  2. then Buy meat at butcher
  3. then Chop carrots into pieces
  4. then Chop potatoes into pieces
  5. then Heat water
  6. then Wait until boiling then add chopped vegetables
  7. then Wait 5 minutes and add meat

More complicated soup

Here it is harder to follow and you need to carefully keep track of what is being performed

Steps 1-4

  1. then Mix carrots with potatoes \(\rightarrow mix_1\)
  2. then add egg to \(mix_1\) and fry for 20 minutes
  3. then Tenderize meat for 20 minutes
  4. then add tomatoes to meat and cook for 10 minutes \(\rightarrow mix_2\)
  5. then Wait until boiling then add \(mix_1\)
  6. then Wait 5 minutes and add \(mix_2\)

Using flow charts / workflows

Simple Soup

Complicated Soup

Workflows

Clearly a linear set of instructions is ill-suited for even a fairly easy soup; it is even more difficult when there are dozens of steps and different pathways

Furthermore a clean workflow allows you to better parallelize the task since it is clear which tasks can be performed independently

Directed Acyclic Graphs (DAG)
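
The simple soup recipe can be written as such a graph, where each step lists the steps it depends on. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or newer) to group the steps into batches that could be carried out in parallel. The step names and the dependencies between them are one possible reading of the recipe, not part of the original slides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps that must be finished before it.
# These dependencies are an interpretation of the simple soup recipe.
simple_soup = {
    "buy vegetables": set(),
    "buy meat": set(),
    "heat water": set(),
    "chop carrots": {"buy vegetables"},
    "chop potatoes": {"buy vegetables"},
    "add vegetables": {"chop carrots", "chop potatoes", "heat water"},
    "add meat": {"add vegetables", "buy meat"},
}


def parallel_batches(graph):
    """Group steps into batches; steps within a batch are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for number, batch in enumerate(parallel_batches(simple_soup), start=1):
    print(number, batch)
```

Seven sequential instructions collapse into four batches: shopping and heating the water can happen at the same time, as can the two chopping steps.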